Clean up inactive iscsi sessions when VMs get moved due to crashes#3819
Conversation
|
Hm - when does this happen exactly @skattoju4 |
|
I will add details. It’s for kvm and an cleanup of sessions they are not removed if move an vm from one host to another, crash of an vm and host crash. This fixes the sessions they are there but should removed. This is not good for managed storage or solidfire.
This fix is tested with us in our production environment.
…__
Sven Vogel
Teamlead Platform
EWERK DIGITAL GmbH
Brühl 24, D-04109 Leipzig
P +49 341 42649 - 99
F +49 341 42649 - 98
S.Vogel@ewerk.com
www.ewerk.com
Geschäftsführer:
Dr. Erik Wende, Hendrik Schubert, Frank Richter
Registergericht: Leipzig HRB 9065
Support:
+49 341 42649 555
Zertifiziert nach:
ISO/IEC 27001:2013
DIN EN ISO 9001:2015
DIN ISO/IEC 20000-1:2011
ISAE 3402 Typ II Assessed
EWERK-Blog<https://blog.ewerk.com/> | LinkedIn<https://www.linkedin.com/company/ewerk-group> | Xing<https://www.xing.com/company/ewerk> | Twitter<https://twitter.com/EWERK_Group> | Facebook<https://de-de.facebook.com/EWERK.IT/>
E-World 2020 in Essen.
Das EWERK ist dabei! Treffen Sie uns vom 11-13.02.20 in Halle 5 am Stand: 5-724, mit spannenden Vorträgen rund um das Thema Urban Data.
Mit Handelsregistereintragung vom 09.07.2019 ist die EWERK RZ GmbH auf die EWERK IT GmbH verschmolzen und firmiert nun gemeinsam unter dem Namen: EWERK DIGITAL GmbH, für weitere Informationen klicken Sie hier<https://www.ewerk.com/ewerkdigital>.
Auskünfte und Angebote per Mail sind freibleibend und unverbindlich.
Disclaimer Privacy:
Der Inhalt dieser E-Mail (einschließlich etwaiger beigefügter Dateien) ist vertraulich und nur für den Empfänger bestimmt. Sollten Sie nicht der bestimmungsgemäße Empfänger sein, ist Ihnen jegliche Offenlegung, Vervielfältigung, Weitergabe oder Nutzung des Inhalts untersagt. Bitte informieren Sie in diesem Fall unverzüglich den Absender und löschen Sie die E-Mail (einschließlich etwaiger beigefügter Dateien) von Ihrem System. Vielen Dank.
The contents of this e-mail (including any attachments) are confidential and may be legally privileged. If you are not the intended recipient of this e-mail, any disclosure, copying, distribution or use of its contents is strictly prohibited, and you should please notify the sender immediately and then delete it (including any attachments) from your system. Thank you.
Am 17.01.2020 um 19:15 schrieb Andrija Panic <notifications@github.com>:
Hm - when does this happen exactly @skattoju4<https://github.com/skattoju4>
I recall the Mike T. implemented a fix for the same when VMs are being stopped inside the OS (originally iSCSI session would not be removed)
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub<#3819?email_source=notifications&email_token=ABJOT5E2F2KNP6A6KOYB2GLQ6HYSJA5CNFSM4KIKA52KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJIQXNY#issuecomment-575736759>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABJOT5GA7WW7SIKAO7SNWOLQ6HYSJANCNFSM4KIKA52A>.
|
|
@syed @skattoju4 thanks for your work! we tested it in our production environment and it works fine. |
|
@syed can you review please? |
|
LGTM 👍 |
|
apart from the license, do you need this in 4.13? |
|
@DaanHoogland for me we dont need it in 4.13. do you see any thing that we need it? |
|
@svenvogel you marked it a bug, that's why i asked, nothing else ... ;) |
GabrielBrascher
left a comment
There was a problem hiding this comment.
Thanks for the PR @skattoju4. I left a question and a minor observation. Overall it looks good (+0.99 🙂 )
|
|
||
| public class IscsiStorageCleanupMonitor implements Runnable{ | ||
| private static final Logger s_logger = Logger.getLogger(IscsiStorageCleanupMonitor.class); | ||
| private static final int CLEANUP_INTERVAL_SEC = 60; // check every X seconds |
There was a problem hiding this comment.
I am OK with the way it is, but I would like to raise the following question: Is it interesting to externalize CLEANUP_INTERVAL_SEC this on a global settings variable (config keys)?
| diskStatusMap.put(disk.getDiskPath(), true); | ||
| s_logger.debug("active disk found by cleanup thread" + disk.getDiskPath()); | ||
| } | ||
| } |
There was a problem hiding this comment.
I would recommend creating a few methods (e.g. checkIfIscsSessionBelongAnyVm(Connect), etc) which would allow documenting (Javadoc) and creating unit tests (JUnit).
|
@DaanHoogland @svenvogel @andrijapanicsb considering that it is a bug, it would be awesome to have it in 4.13 branch (aiming 4.13.1 release) and then forward it to master (aiming 4.14). |
|
@GabrielBrascher @DaanHoogland @andrijapanicsb Gabriel - you are right. we should move it to the 4.13 branch and then forward it to 4.14 👍 @skattoju4 can you change this? |
cc3402f to
a750ce0
Compare
a750ce0 to
932a98b
Compare
| // check if they belong to any VM | ||
| int[] domains = conn.listDomains(); | ||
| s_logger.debug(String.format("found %d domains", domains.length)); | ||
| for (int domId : domains) { | ||
| Domain dm = conn.domainLookupByID(domId); | ||
| final String domXml = dm.getXMLDesc(0); | ||
| final LibvirtDomainXMLParser parser = new LibvirtDomainXMLParser(); | ||
| parser.parseDomainXML(domXml); | ||
| List<LibvirtVMDef.DiskDef> disks = parser.getDisks(); | ||
|
|
||
| //check the volume map. If an entry exists change the status to True | ||
| for (final LibvirtVMDef.DiskDef disk : disks) { | ||
| if (diskStatusMap.containsKey(disk.getDiskPath())) { | ||
| diskStatusMap.put(disk.getDiskPath(), true); | ||
| s_logger.debug("active disk found by cleanup thread" + disk.getDiskPath()); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
checkDiskStatusMap();
| for (final LibvirtVMDef.DiskDef disk : disks) { | ||
| if (diskStatusMap.containsKey(disk.getDiskPath())) { | ||
| diskStatusMap.put(disk.getDiskPath(), true); | ||
| s_logger.debug("active disk found by cleanup thread" + disk.getDiskPath()); |
There was a problem hiding this comment.
space needed in log message before the path.
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
disconnectPhysicalDisks();
DaanHoogland
left a comment
There was a problem hiding this comment.
no real issues in the code, but i would like to see some more partitioning of the logic in methods.
|
thanks for the changes @skattoju3 . looks a bit cleaner. |
|
@blueorangutan package |
|
@DaanHoogland a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✖centos6 ✔centos7 ✔debian. JID-714 |
|
@blueorangutan test |
|
@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
|
Trillian test result (tid-853)
|
Thanks for the feedback :) |
|
Would this fix cause a problem with Hypervisors that use iSCSI as its system disk? It seems that when iscsiStorageCleanupMonitor runs, it kills the iscsi connection for the hypervisor iscsi system disk and causes the hypervisor system disk to go into a read only state. Is there a way to stop this running - I dont use iSCSI for KVM guests |
|
Yes this seems to be the case. It is currently a limitation. Iscsi disks that are not attached to VMs will be marked as inactive and attempted to be cleaned up. One solution could be to have this feature be toggle-able by a global setting. |
|
Thanks for the update. If I just use the agent version 4.11, will that not perform the iSCSI cleanup and the HVs can work normally again or is this a management server thing? |
|
It's part of the agent. Using an older agent should do the trick for now. |
Description
Previously, iscsi sessions would not be cleaned up on KVM when
This leads to to long boot times from the hypervisor host and session overloading of the netapp solidfire nodes. Each node can have a maximum of 700 iSCSI sessions. The iSCSI Daemon on the hypervisor host holds iSCSI sessions that are not cleaned up. The iSCSI Daemon tried to reconnect every time resulting in holding the unused sessions. Only a logout of the iSCSI daemon to the volume will solve the problem. This changes adds a kvm.storage.IscsiStorageCleanupMonitor class that scans for and cleans up inactive iscsi sessions.
steps to reproduce:
ìscsiadm -m session -P 0virsh destroy vmname)ìscsiadm -m session -P 0and you will see sessions they should not be thereTypes of changes
Screenshots (if appropriate):
How Has This Been Tested?
This has been tested by observing that inactive iscsi sessions are cleaned up when VMs are removed.